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Coarse- Graining (CG) models are low resolution approximation of high resolution models, such 
as all-atomic (AA) models. An effective CG model is expected to reproduce equilibrium values of 
sufficient physical quantities of its AA model, which requires to match the equilibrium probability 
density of the CG model to that of the AA model in conformational space. The present work 
proposes for constructing effective CG models a novel methodology that aims at minimizing the 
distance between CG model and AA model. The distance is defined as a functional of conformational 
probability densities in CG and AA models and further expanded by ensemble averages of a set 
of sufficient and independent basis functions. An orthogonalization strategy is adopted to get 
the independent basis functions from sufficiently preselected interesting physical quantities of the 
system. Two variational methods are developed to optimize parameters of effective CG force field 
by minimizing the functional of probability densities, are then generalized so that the CG model 
also reproduce the pressure of AA model. The general CG framework is verified in constructing 
one-site CG water from T1P3P water model. 



I. INTRODUCTION 

Two branches of computational physics, namely 
macro-scale continuous fluid dynamics and micro-scale 
molecular dynamics simulation, never stop efforts in ex- 
tending their power to wider scales. For example, molec- 
ular dynamics simulation has been successfully applied 
to model mesoscopic three dimensional Rayleigh-Benard 
convection [l| and by the same token fluid dynamics has 
been applied to simulate nano-scale convection behavior 
in electrochemical reactions Q. However, even though 
the proliferation of multiprocessor computers and par- 
allel computation techniques have inspired the ambi- 
tion of computer scientists, molecular dynamics simu- 
lation is still an torment for most large scale systems 
like macromolecules' self-assembling problem whose size 
is micrometer scale comprising of hundreds of millions 
of atoms and whose phase transition time is longer than 
100 nanoseconds Q. In this circumstance coarse-grained 
(CG) method, as a promising way, and in some situations 
the only wayJU, of bringing molecular dynamics simu- 
lation into large applications on multi-/zs time scale or 
multi-/xm length scale has therefore been discussed exten- 
sively recent years [USUI- Different from all- atom (A A) 
molecular dynamics, CG method usually subsumed high 
frequency intra-molecular vibrations into coarse grained 
sites. The low resolution model is then simulated in 
a carefully designed effective force field. This proce- 
dure has been proved available for protein dynamics Q, 
protein-membrane interaction simulation as well as 
protein folding 0. 

Usually, the construction of a CG model includes three 



steps. The first one is to select the mapping from A A co- 
ordinates to CG coordinates, simply denoted as x = x(q). 
Here q and x represent the high-dimcntional conforma- 
tion vectors in AA and CG models, respectively. For 
example, while q is the position vector of all hydrogen 
and oxygen atoms of waters, x can be chosen as the posi- 
tion vector of the center of mass of water molecules, thus 
x{q) is a linear function of q. It is also possible to design 
another mapping, for example, if selecting the spatial 
density of molecules, p(R), as the coarse-grained coor- 
dinates, x, where R is the usual 3— dimensional spatial 
vector, we may bridge molecular systems to macroscopic 
continuous field models. The second step is to presume 
a formula as effective force field of CG model with some 
free parameters, U (x; w 7 ), e.g., a pair additive interaction 
in one-site CG water model, U{x; u 7 ) = u i r ij)- Here 
values of the pairwisc potential function u(r) at different 
pair distance r are free parameters, is the distance 
between the ith and jth CG sites. If p(R) is chosen as 
the x in continuous field model, U (x; it 7 ) might be a den- 
sity functional as U = Jp(R)w(R,R')p(R')rfRrfR / Q3] 
with the kernel w(R, R') as parameters. The final step 
of constructing CG models is to determine the free pa- 
rameters m 7 by minimizing the difference between CG 
model and the referenced AA system. While the first 
two steps are usually set based on afore experience and 
knowledge of the studied systems, many different coarse- 
graining techniques have been developed to focus on op- 
timizing U{x;u 1 ) in the m 7 parameter space, based on 
different measurement of inter-model distance. A quite 
recent review is shown in reference [ll| . Among of them, 
the traditional coarse-graining methods calculate the dis- 
tance based on equilibrium averages of some chosen phys- 
ical quantities, such as the radius distribution function 
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(RDF) of CG sites [Tg, 

Dtrad = / [g cg (r) - 9aa(r)} 2 p{r)dr, 



(1) 



where g cg /aa( r ) is the ensemble averaged RDF of CG 
sites in the CG/AA model, and p(r) is an optional weight 
function. The traditional CG method has already been 
widely applied in various systems [H, [13, [H, fill. Two 
known algorithms, iterative inverse Boltzmann [l7[ and 
reverse Monte Carlo [l8[ are applied in the traditional 
CG techniques to optimize CG force field. The itera- 
tive inverse Boltzmann method, introduced by Reith et 
al. [l7| iteratively adjust inter-particle interactions based 
on an idea gas assumption which declares the RDF of 
CG sites is equal to the exponential of the pairwise po- 
tential function, explicitly g(r) = exp[— f3u(r)] for the 
pairwise potential function u(r) and the equilibrium av- 
eraged RDF g{r) of CG sites. The assumption can be 
exact only in the low density limit. But fortunately 
the iterative inverse Boltzmann has been proved to be 
valid for dense systems like Lennard- Jones liquid [l7j], 
TIP3P water [l9[ and polymer systems [l?], Sim- 
ilar to the iterative inverse Boltzmann method, the re- 
verse Monte Carlo (RMC) method [l8[, also consists of 
iterative adjustment of interactions, only that modifica- 
tion to potential function in each RMC iteration cycle 
is determined by the steepest descent algorithm, which 
calculates the derivatives of the equilibrium averages of 
the chosen physical properties to update the u(r). RMC 
has been used to determine ion-ion interaction in aque- 
ous NaCl solutions to study diffusive dynamics of 



liquid water 21| , to model behavior of mcsoscopic lipids 
and lipids assembly in water [22j and to study bilayer 
membrane [23j . 

Obviously, the optimized CG model based on the tradi- 
tional CG method is dependent on the selection of phys- 
ical quantities. To avoid (or depress) the dependence of 
estimating D tra d on the selection of physical quantities, 
an alternative CG method is to match whole the free en- 
ergy surface of the AA model in the CG conformational 
space [24j|, which is defined as, 



F(x) = -k B T\n J cxp[-(3V(q)]5{x - x(q))dq, 



(2) 



where fcs is the Boltzmann's constant and T is tem- 
perature, (3 = 1/ksT. V(q) is the potential energy 
surface of the AA system, 8Q is the Dirac-<5 function. 
Thus the distance between the CG model and AA system 
could be defined as Df e = ([AU(x)] 2 ) p , where AU(x) = 
U{x; u-y) — F(x) and p(x) is an optional probability den- 
sity. By calculating values of the free energy F(x) at 
many sampled CG conformations, \x l \,i = 1, • • ■ , M, 
based on a jump-in-sample algorithm [24j . we estimated 
the distance as Df e = -y ^JAET^x*)] 2 and optimized ef- 
fective CG force field of tetrahedral molecular liquids [24| ■ 
The free-energy matching method is expected to better 
take into account the overall characteristic of the AA sys- 
tem in comparison with the traditional CG methods [24j, 



however more computational costs are usually required 
in the free-energy matching method than the latter. Re- 
cently, by matching the total AA force on CG sites, a 
new CG method, force matching method [25|, [26|, has 
been presented and widely tested in various of systems 
(see [ll| and references therein). The force matching 
method closely related to an improved variant of the 
free-energy matching CG method. Replacing to the very 
time-consuming calculation of {F(x 1 )}, we may calcu- 
late the gradients of free energies, ^\ x - x %, for exam- 
ple, based on the blue- moon ensemble simulations [27| . 
and define the distance between CG and AA models as 
D feg = jjT,i\m AU ( xi )\ 2 t0 optimize CG force field. 
While the CG mapping function x = x(q) is linear, in 
principle, the force matching method can give the same 
results of the CG method by matching the gradients of 
free energy. The force matching CG method (or the 
more general free-energy-gradient matching method) sig- 
nificantly decreases the required computational costs in 
comparison with the free-energy matching method, but 
similarly keeps the accuracy of the latter. 

In the work we present a new methodology to optimize 
effective CG force filed based on matching the equilib- 
rium probability density of AA system in the CG confor- 
mational space, named as distribution matching method. 
The ratio of the probability density function in CG model 
to that in AA model is expanded on a set of complete and 
linearly independent conformational functions (i.e., basis 
functions) . We find the equilibrium fluctuation of the ra- 
tio of probability density functions accurately measures 
the distance between CG and AA models, since the fluc- 
tuation provides an upper limit of the error of reproduc- 
ing AA equilibrium average of any conformational func- 
tion in the CG model. The distribution matching CG 
technique has the almost similar computational cost as 
the traditional CG method but takes into account the 
overall characteristic of the free energy surface F(x) of 
AA systems, so it has the advantages of both the tra- 
ditional CG method and the free-energy-based methods 
(matching free energy or its gradient or force). 

This article is organized as follows, in Sec. UH we first 
introduce the theories on definition of consistency be- 
tween CG force field and A A force field, and then give 
the formulas of distribution matching inherited from con- 
sistency condition. After, details of an expansion of dis- 
tribution function are outlined, two ways of applying the 
expansion to the distribution matching method are then 
present and compared in this section. Sec. lIIII starts from 
an application of the distribution matching method for 
water model. 



II. THEORY AND METHODS 

While groups of atoms in a AA system are mapped as 
CG sites, i.e., x = x(q), the interaction among CG sites, 
U(x;Uy), need to be optimized to minimize the differ- 
ence between CG model and the AA system. It possibly 
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requires U(x;u 1 ) matches the free energy surface F(x) 
in whole the CG conformational space x, or more prac- 
tice, in important (and interesting) CG conformational 
regions of x. We define a weight function as ratio of two 
probability density functions, 



?0) 



. Pcg(x) 
~ Paa(x) 7 



(3) 



where P cg /aa{ x ) is the equilibrium probability density of 
the CG/AA model in the x space, for example, P cg (x) oc 
e -pU(x;u y ) anc j K e -/3F(x) m canomca j ensemble. 

The weight function io aa ^ cg (x) oc exp{— j3\U(x\ u 7 ) — 
characterizes the difference of the CG model from 
the AA models. Considering the fact 



(x)A(x)) aa = (A(x)) 



(4) 



for any conformational function A(x), the weight func- 
tion can be expanded as a scries of complete basis func- 
tions A»(x) [H], 

(aa) (8 aa A»(x)) cg 5 aa A v {x). (5) 

Here (• • • ) aa /cg denotes the ensemble average in the 
AA/CG model, 5 aa A»(x) = A»{x) - (A»(x)) aa , g^(aa) 
is the inverse covariance matrix of the basis functions 
which obey the equality, g\^(aa)g^ '(aa) = 5\ while 
g^(aa) = (S aa A^(x) 8aaA v (x)) aa is the covariant matrix 
of the basis functions in the AA model. Wc used Einstein 
summation notation for Greek superscript or subscript 
here and in the text below of the paper if without explicit 
expression. The fact (w(x)) oa = 1 is explicitly wrote out 
in Eq. (|5|). We can define the difference from the AA 
model to CG model as the equilibrium fluctuation of the 
weight function in the AA model, 



+cg — ( UJ aa-+cg( x )) ' aa 1 

= 9nu{aa) (SaaA^cg (8 aa A") cg , 



(6) 



because s aa ^ cg gives the upper limit of the relative error 
of the ensemble means of any variable A(x), 

e = \{A(x)) cg - (A(x)) aa \ = |((w - l)S aa A) aa \ 



< y/{u 2 )aa - ly/((8 aa A»)*) c 
= Saa^ctj CT\A ), 



(7) 



where a(A^) is the fluctuation of A fM (x) in the AA sys- 
tem. It is worth to mention that s aa ^ cg is not a normal 
distance, since it is not symmetric about aa and eg, i.e., 



^ So 



It is not difficult to define a sym- 



metric distance, such as, D 2 (aa,cg) = 4((tu'p^ cg )p — I) 
where P(x) = [P aa (x) + P cg (x)]/2 is the average prob- 
ability density and ujp_ >c (x) 



For theoretical 



+cgV*V — P ( x ) 

view points, it might be more robust to use the symmet- 
ric distance D 2 (aa,cg) in optimization of CG effective 
force field. However, the minimization of D 2 is similar 
to that of s 2 a ^ cg except needing slightly more computa- 
tional cost in comparison with the latter. In the current 



paper, we only use the s 2 a ^ cg to illustrate the optimiza- 
tion of CG force field, it is direct to extend the method 
based on the symmetric distance. Actually, the only dif- 
ference among the three distances is that the inverse co- 
variance matrix g^ are estimated in different equilibrium 
conformational samples, such as s 2 a _^ cg = g fl „(aa)a fl a 1 ' , 
s l g ->aa = 9^v{cg)a^a v and D 2 (aa,cg) = D 2 (cg,aa) = 
9flv (P)aW. Here a^ = (A») cg - (A») aa . 

While the basis set A fl (x) is complete, Eq.([5]) is exact 
and independent on the applied CG potential U (x) , thus 
we have an analyzed expansion of free energy surface in 
(any) high-dimensional x space, 

F(x) = U{x) + k B T\n[l + 9llv (aa)a» 5 aa A v (x)] 

= U(x) - fc B Tln[l - g^{cg)a» 6 cg A v (x)}. (8) 

However, in practice, we only can use a finite-size basis 
set (and finite-size AA/CG equilibrium samples) in the 
expansion, thus the obtained F(x) may have significant 
errors at somewhere of the x space unless the dimension 
of x is small and/or U(x) very closes to F(x). Although 
the expansion can not be expected to be accurate in ev- 
ery where of x space, s 2 is able to be estimated very well 
from Eq.®, if many (but not very large number of) ba- 
sis functions are applied in the expansion. We can chose 
interesting physical quantities of the system as much as 
possible to capture the difference between the AA to CG 
models. The linear correlated basis functions are auto- 
matically discarded by 3^^(00) in Eq.®. In addition, 
since we estimate g^icia) in a finite-size conformational 
sample of the AA system, the number of independent 
basis functions is not more than the size of the sample. 
Actually, unless the difference between AA and CG mod- 
els is very huge, the number of required basis functions 
is far smaller than the size of sample in estimate of s 2 . 
More detailed discussion about the completeness of basis 
function in estimate of s 2 is discussed in the reference [29[ 
where we applied the same expansion in structuring and 
sampling conformational space of complex systems. 

Now the difference between two models (actually, two 
finite-size equilibrium conformational samples) is calcu- 
lable, the only thing left for constructing CG models is to 
minimize s 2 by consecutively adjusting the CG force field 
U(x, u-y). If any one of basis functions is linearly depend 
on other basis functions, the covariant matrix g^ u will not 
have a full rank, then the inverse covariant matrix will 
be singular. In this situation, the basis functions can be 
rearranged so that linearly dependent functions can be 
eliminated. To do this wc orthonormalizc basis functions 
in the way that A^ = C£8 aa A v where {^} M =i,2,... is a 
set of orthonormalizcd functions that have a diagonal- 
ized covariant matrix (A^A l ') aa whose diagonal elements 
is either zero (or very small) or unity. The orthonormal 
functions A^ with zero (or very small) eigenvalues can 
be safely cast away in that they are linearly dependent 
on the other basis functions. The distribution difference 
then becomes s 2 = J2 u ((A tl )cg) 2 ■ 
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A. Fitting pairwise additive central potential 

If the effective potential of CG model is pairwise addi- 
tive and central, we have, U(x; u 7 ) = Yl^j f{ r ij)- Here 
Tij = |rj — Yj\ represents the distance between the zth 
and jth CG site. The pair-central force can be gener- 
ally expressed in terms of a table of parameters, i.e., 
i P{ r ) = Tln=i — r7 ) u 7j m which N 1 is the total num- 
ber of tabular parameters, 8{r) is a double-step function 
that 5{r) = 1 when — Ar/2 < r < Ar/2 and is other- 
wise (in practice, we used a continuous version of the 5(r) 
so that its first-order and second-order derivatives exist 
everywhere). The gradient of s 2 is given as following, 



where f 1 {x) is defined as 
_ dU{x) 



(9) 



(10) 



and 5 cg f 1 {x) = f^(x) — (f 7 ) cg . The total energy in terms 
of potential parameters becomes U(x;Uj) = ii 7 / 7 (x), 
and / 7 (x) is actually the number of the CG-particle pairs 
with the pair distance in the interval (r 7 < r < r 7 + Ar) . 
There are lots of local optimization methods can be used 
to minimize s 2 , such as the steepest descent method, the 
conjugate gradient method, etc. It is also possible to 
directly update U{x\u 1 ) from Eq. ([5]). Specifically, the 
basis functions in the right side of Eq. (O can be replaced 
by the orthonormalized basis A M , the distribution differ- 
ence in the left side can be re-expressed in terms of tabled 
potential parameters u 7 , further, by taking natural log- 
arithm of both sides of Eq. ([5]) , it yields 



/3^/ 7 A W7 =lnp(x)] 



(11) 



where u){x) = lo(x), if cu(x) which is estimated from 
Eq. is larger than a presumed small positive value 
u> € , but uj(x) = u> e if lu(x) < uj e . Here uj e is applied to 
make sure the left side of Eq. (jTT|) be a real number. In 
this paper, we set lj £ = 0.001. As we mentioned, due 
to the incomplete basis functions in Eq. ([5]), it is possi- 
ble the estimated uj(x) is very closed to zero or even be 
negative at some x, although the exact lo(x) should be 
positive anywhere in principle. Therefore, we are able to 
correct the tabular potential parameters of U(x]u 1 ) in 
an iteration process by estimating Au 7 from the linear 
equation below, 

~f3Y,(f X P)xA Ul = (f x HCo(x)]) x . (12) 



hybrid sample which consisted of equilibrium conforma- 
tions in both AA and CG models. Starts from Eq. ^ or 
Eq. p2[) . an effective CG potential can be obtained by 
following algorithms: 

1. Choose an initial guess of potential function it 7 . 
For many CG models the logarithmic RDF is a 
good start point, 



if = -fc B Tln ff (r 7 ), 



(13) 



where g(r 7 ) is RDF at the pair distance r 1 of CG 
sites in the AA system. 

2. Run a MD simulation under the potential with the 
parameters formed in the ith iteration, Uj, calcu- 
late values of / 7 (x) and the orthonormalized basis 

functions at the sampled conformations, and 

d 

Da-, 



estimate s 2 (and its gradient V 7 s 2 = gf-s 2 ). 



3. Find the correction Au^ based on the direct it- 



eration method described in Eq. (|T2|) . the conju- 
gate gradient method or another local optimization 
methods, the parameters of CG potential for the 



next iteration is u 



(i+l) 
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4. Do aforementioned two steps until s 2 is smaller 
than a preselected threshold, which could be de- 
termined by analyzing statistic error of s 2 . 



B. Pressure Correction 

However, while the ensemble average values of any con- 
formation function A(x) in the AA model can be approx- 
imately obtained from {A(x)) cg with a relative error not 
more than s, some interesting physical quantities, such 
as pressure, can not be written as the ensemble average 
of a common conformational function in the CG and AA 
models. Actually, the microscopic function of pressure 



- Nk B T 1 dW(x) 



(14) 



has explicit dependence on potential energy surface 
W(x). Here V = L d is the volume of the d-dimensional 
simulation box with length L, and W(x) is dependent 
on L while scaling the real conformation x as the dimcn- 
sionless one z = x/L. For example, given the assumption 
that the force filed of CG model is pairwise central addi- 
tive and d = 3, the pressure is thus able to be expressed as 
an integral over RDF [l(| and further can be discretized 
in terms of potential parameters, 



The angular bracket with subscript X means the ensem- 
ble average in an optional conformational sample, such 
as the equilibrium sample in t AA or CG model, or any 



2npN f 00 2 , . 

~^y- J [3r g(r; x) + r g (r; x)\u{r)dr t - 



T 7 (x)u 7 + pk B T, 



P 
(15) 
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where p = N/V is the number density of CG particles, 
g{r;x) is value of RDF of special CG conformation x at 
inter-particle distance r, g'(r; x) 



dgjrjx) 
dr 



and 



2-rrpNAr 
W 



[3< ff (r 7 ;a;)+< g '(r 7 ;x)]. (16) 



Here we already suppose r 3 g(r;x)u(r) = while r = 
and r — » oo. It is clearly, V cg (x) is explicitly depen- 
dent on the parameters it 7 of the CG force field, thus 
the macroscopic pressure, V cg = ('Pcg{%)}cgi n °t only de- 
pends on the equilibrium distribution of CG, but also 
explicitly depends on the potential surface, U(x;u\), it- 
self. Thus, the error of the macroscopic pressure, e-p — 

\{Vaa{q))aa ~ (V cg (x)) cg \ IS, 



I {Paa P eg) aa H~ {Pcg}aa {Peg) cg\ 



(17) 



While the second term in the right side of Eq. (fT7| is 
limited by the s aa _> cff , the first term might be large since 
the microscopic pressure V has different dependence on x 
in CG model from that in AA system. For reconstructing 
the pressure in CG model, extra efforts should be taken. 
One simple way is to directly use a penalty function to 
define a pseudo distance, 



+ a[(P cg {x)) cg -(f aa {q)} 



2 

aa\ j 



(18) 



where a is a positive constant. The gradient of the 
pseudo distance now becomes 

ds 2 ds 2 _^ d 

= du^ 9 + 2a ^ Vc ^ c 3 ~ (Vaa)aa]-^-(V cg ) cg , 

(19) 



where 
d 



(V cg ) cg = -(3{S cg P{x) V cg (x)) cg + (TWcg. (20) 



With Eq. (|18p and Eq. (fl9|) , we can minimize s 2 P by using 
usual local minimization methods, such as the conjugate 
gradient method. In Eq. (fT8|) . the penalty coefficient a 
can be any positive number in principle. However, it is 
possible to make the pressure correction more consistent 
with the distribution matching scheme by defining the 
value of penalty coefficient in following way: First, recall 
the expansion expression given by Eq. (|6|), if 5 aa A^ are 
orthogonal to each other but not necessarily have been 
normalized, then Eq. ([6]) can be re-expressed as s 2 — 
E^^aaA^lg, where 5mm = {(6 aa A^) 2 )^ . Now if we 

add a hybrid basis function A aa T>(x) = V cg {x) - (fi aa )aa 
in the basis function set and neglect the correlation be- 
tween the function and the rest of basis functions, we 
are able to give an appropriate value for the penalty co- 
efficient, that is the reverse variance of P aa in the AA 
system, i.e., 



l&aa(q))aa ~ (P aa{q))U 



(21) 



It is possible to also consider the correlation of A aa V(x) 
with the other basis functions to further improve 
Eq.([18]). In the paper, we only simply use Eq.([THJ) to il- 
lustrate the possibility of involving pressure matching in 
optimization of CG models, more discussions will appear 
elsewhere. 



III. TEST CASE: ONE SITE WATER MODEL 

In this section, we use the distribution matching 
method to construct an effective (CG) force field for one 
site water model. We choose the TIP3P water model [30(] 
as the AA model because of its simplicity. The illus- 
tration of applying the distribution matching method 
to optimize effective CG force field is not dependent on 
the selection of higher-resolution model, for example, we 
even can construct the one-site water from ab initio sim- 
ulations of water. The TIP3P water model treat wa- 
ter molecular as a rigid three sites model, and its force 
field involves only nonbounded interactions, i.e., the elec- 
trostatic interactions and the Lcnnnard-Jones potential. 
The TIP3P simulation of water was carried out in the 
constant NVT condition, where the number of water 
molecules TV = 216 in a cubic simulation box with the 
volume 6.4585 nm 3 (corresponding the density of water 
as 1.0 g ■ cm -3 ) at the temperature 300 K. In one site 
CG model, the water molecules are replaced by a spher- 
ically symmetrical site mapped by the center of mass 
of water molecules with the mass 2ttih + mo- Both 
the CG model and the TIP3P model are simulated by 
the MD package NAMD2.5 [H[ with our modification 
for coarse-graining MD simulations. In all simulations, 
the Lennnard- Jones potentials are gradually switched off 
from radius r switch = 0.8 nm to r cutoff = 0.9 nm, 
the electrostatic force is calculated based on particle 
mesh Ewald method [H|. Langevin thermostat with 
the a damping coefficient of 5.0 ps _1 is used; a length 
20ns trajectory was generated from TIP3P simulation, 
and conformations are collected in a frequency of ev- 
ery 0.4 ps after the first 10 ps simulations for sake 
of thermal equilibrium. The CG simulations were per- 
formed in the same condition with the TIP3P simulation 
but the length of trajectory at each parameters u 7 is 
as short as 800 ps with a sampling interval of 0.2 ps 
after droping first 10 ps trajectory. The gir^x) at 
Tfj, = r rn i n + /uAr , fj, = 0, • • • , 99 are chosen as the 
basis functions for the expansion described in Eq. 
Ar = 0.008 nm and r m i n =0.1 nm. It is directly to use 
more basis function, such as multiple-body correlations, 
local orientational orders, in Eq. ([6]) to obtain more ro- 
bust CG force field, the weights of all these preselected 
physical quantities in s 2 are automatically provided by 
estimating g M „. As more and more physical quantities 
are applied, the s 2 will reaches saturation and does not 
change any more even more physical quantities are ap- 
plied [1^]. In the paper we only apply the RDFs to fo- 
cus on illustrating the distribution matching CG method. 
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Without loss any generality, these values of RDF are first 
scaled then are applied as the basis functions, i.e., 



6aaA»( X ) 



gjr^x) - (g{r^x)) aa 
Oaaigir^x)} +e A 



(22) 



where a aa [g{r^x)) = y / (g 2 {r;x)) aa - (g(r;x))l a is fluc- 
tuation of g(r M ; x) in the TIP3P model, ea is a small pos- 
itive value that makes sure the denominator is nonzero. 
The constant scaling factors, o- aa [g{r^\x)\ + £a for dif- 
ferent fi, makes all basis functions have the same order 
value so that the calculation of the corresponding ma- 
trix g^ v has better numerical stability. Inherit but differ- 
ent from the idea of orthogonalization protocol proposed 
in last section, which be also seen in the reference [28| . 
we calculate the inverse covariance matrix by factoriz- 
ing g^ v . Because g^ u is positive definite and symmetric, 
it has all positive eigenvalues and can be decomposed 
to a form g^" = V T DV where V is a unitary matrix 
whose rows are orthonormal, D is diagonal matrix whose 
diagonal elements are eigenvalues of g^ v . In practice, 
some of the basis functions can be linearly dependent on 
the others, therefore the covariance matrix g^ v has some 
zero eigenvalues and in that case D irreversible. How- 
ever if we choose row vectors of V as basis functions 
and desert those with zero length, we are able to build a 
non-degenerated functional space. Noticing that desert- 
ing eigenvectors are the same as truncating eigenvalues. 
We define the inverse matrix of D as: 



1/D« ifDi 
ifD,. 



>e D 
< e D 



(23) 



Here Ed is a threshold for truncating eigenvalues, in our 
case, sd is set to 0.001. Then, the inverse covariant ma- 
trix writes as g^ u = V T D _1 V. An alternative method to 
form g^y = V T [D + ££)I] -1 V, where I is the unit matrix. 
We starts from potential function defined in Eq. (fl"3"J) and 
minimize the s 2 with both the conjugate gradient method 
and the direct iteration method described by Eq. (fT2")) . 
Both of the methods work effectively except for the fact 
that the direct iteration method minimizes s 2 slightly 
faster than the conjugate gradient method in the case. 
As demonstrates in Fig. [TJ when using the direct iter- 
ation method s 2 decrease from 12.0 to 0.01 within the 
first 10 iteration steps, on the other hand, the conjugate 
gradient method need more than 15 iteration steps to de- 
crease s 2 from 11.9 to 0.3. When the distance is smaller 
than 0.001, RDF, which we used to construct the basis 
functions, of referenced TIP3P model can be reproduce 
by CG model remarkable accurately, as demonstrated in 
FigH 

Former researchers [13, [l9| used to correct CG force 
field by adding a linear item to the effective pair potential 
to reproduce pressure of AA system in CG model. We de- 
clare that the linear correction is not the best compromise 
to revise pressure while having reasonably small effect on 
the accuracy of structure properties' reconstruction. The 
effective pair potential of CG before and after the pres- 
sure correction described by Eq. I|18p was shown in Fig. [3] 



Correspondingly, the evolution of s 2 P is shown in Fig. [1] 
RDF after the pressure correction is also shown in Fig. [5] 
as comparison. Fig. [3] clearly demonstrates that the pres- 
sure correction has change the pair potential more in the 
large r region, i.e., 0.6 ~ 0.7 iito, than in the small r 
region, i.e., 0.2 ~ 0.5 nm. In Fig. H s% = 0.084, but 
if take off the pressure penalty item the residue becomes 
s| P = sp — a(V cg — Vaa) 2 — 0.026, which is insignificant 
comparing to s 2 (20). 

As for the one site water model, as shown in Fig. 2J 
the variance of pressure in the AA system, a(S aa P) = 
818.1bar, which is equal to 0.012 kCal ■ mol^ 1 ■ A~ 3 , we 
use l/a 2 (S aa V) « 7200 mo^ 2 • A 6 ■ kCal~ 2 as the value 
for a in the pressure correction. We carried on the pres- 
sure correction after s 2 had decreased to 0.014. The evo- 
lution of s 2 P is presented in Fig. [TJ It is worth noticing 
that in the first four steps, minimization of s 2 AP makes 
concession to that of s 2 P and increase from 0.014 to 0.3 
due to the fact that pressure difference is too large. How- 
ever, from the fifth step on, both of them can decrease 
smoothly. This is largely because pressure difference is 
numerically independent on RDF differences. Correction 
to the potential at this moment is roughly divided into 
two parts, one is the fine tune which add short wave 
length modification function Aip(r) to pair interaction, 
say tp(r); the other part is the global shift which change 
the potential function significantly while holding the fine 
structure. 

One of the well known dilemmas in CG techniques is 
that the pressure correction would lead to less accuracy 
in representing of other properties [ijl HH, dH, [34[ such 
as isothermal compressibility and conformance of RDF. 
This effect is shown in Fig. ©: when pressure consis- 
tency is taken into account, s 2 increases from 0.001 to 
the value of 0.026. However on the other hand, CG with 
pressure correction shows greater extensibility than that 
without pressure correction at temperature T = 370 K 
and T = 230 K. It indicates the additional pressure cor- 
rection makes the CG model be more consistent with the 
AA system. Another appealing feature Fig. ([5]) shows 
is that in the temperature region of T e // = {T|290 < 
T < 312} , s 2 P < 0.04 and s 2 < 0.017, both of which are 
very small value. Especially that in T e ff, s 2 is smaller 
that the best case of s 2 p . Since the RDF corresponding 
to s 2 P = 0.026 is almost indecipherable, we can safely de- 
clare that T e f f is the effective temperature region for the 
potential obtained from our method. 



IV. CONCLUSION 

The present work introduces a new methodology, the 
distribution matching method, to optimize CG force field 
effectively and efficiently. Consistency condition between 
CG and detailed atomistic model is given and reinter- 
preted as a requirement of matching of distributions thus 
equilibrium average of a set of sufficient and independent 
basis functions through the distribution expansion anal- 
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ysis. Based on the analysis, wc proposed a two steps 
protocol to construct effective CG force filed. The first 
step is to expand phase spaces differences between CG 
and AA model as linear combination of basis functions. 
In this part orthogonalization technique is suggested to 
avoid singularity led by linear dependencies among ba- 
sis functions. The second step is to minimize the de- 
fined difference in whole the conformational space. Two 
different minimization approaches are introduced in this 
part, namely the conjugate gradient approach and the di- 
rect iteration approach. Both two approaches are demon- 
strated in the case of being applied to fit pairwise additive 
force field. With aforementioned two steps protocol we 
have been fully able to construct effective pair additive 
force field for CG models. We test this statement by ap- 
plying the formulas to building one site model for TIP3P 
water model. Considering that pressure consistency is 
required in some situations we propose a method to cor- 
rect pressure in accordance previous two steps protocol. 



It is enlightened by constraint optimization techniques. 
By adding a penalty item to the definition of phase space 
difference we are able to limit the pressure deviance. The 
effectiveness of pressure correction method is also verified 
in one-site CG water model. The present methodology 
is encouraging, its capability to reverse distributions of 
arbitrary functions defined in CG coordinate space indi- 
cates a wide applications in multi-scale simulations. 
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Optimizing Steps 

FIG. 1: Evolution of s 2 and s 2 P in the first 20 iteration steps 
using different optimizing algorithm. In the figure, the sub- 
script notion conj refers to the conjugate gradient method; 
direct means the direct iteration method (see text). At the 
20th step, s 2 (20) is 0.3 with the conjugate gradient method, 
is 0.007 with the direct iteration method, meanwhile Sp(20) 
is 1.4 and s 2 AP = 0.16 




FIG. 2: Comparison between center of mass RDF of TIP3P 
water model and site-site RDF of one-site CG water model. 
Both simulations were performed under the constant NVT 
ensemble at temperature of 300 K and density of 1.0 g/cm 3 . 
The inset shows an enlarged region of the first peak. The 
s 2 between the CG model without the pressure correction 
(CG-NPC) and the TIP3P model is 0.001, and s% between 
pressure corrected CG model (CG-PC) and TIP3P model is 
about 0.084. 
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FIG. 3: The effective pair potential for CG with and without 
pressure correction; the inset shows the difference between the 
two. s 2 and s 2 P of the two potentials are the same with those 
of Fig. m 



0.04 




Pressure [Pa] 

FIG. 4: This figure shows pressure distribution for, from left 
to right respectively, all-atom model (triangle line), coarse- 
grained models with (circle line) and without (square line) 
pressure correction. The average pressure of these distri- 
butions are, accordingly, —184.3 bar, —64.5 bar and 1.08 x 
10 4 bar, resepctively. 
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FIG. 5: Comparison of extensibility of pair potentials on dif- 
ferent temperatures. The initial potentials is obtained in 
T = 300 K from different methods, using versus not using 
pressure correction in particular, and then is tested in other 
temperatures from 230 K to 370 K. The results are then 
compared to TIP3P simulations at the same temperature so 
as to calculate the free energy distances (s 2 ). The circle line 
shows the dependence of s 2 on temperature for non-pressure 
correction pair potential, its value at T = 300 K is 0.001; the 
triangle line show the results of the potential with pressure 
correction, its minimum value is 0.026. 



